[USMP] Initial implementation of liveness analysis for Relax + TIR #250

gigiblender · 2022-09-15T12:02:29Z

This PR adds an initial implementation of liveness analysis of tensors/buffers for Relax and TIR programs.

@areusch @mbaret @YuchenJin @mikepapadim

YuchenJin · 2022-09-21T18:23:38Z

Thanks @gigiblender for integrating USMP into Relax!

One idea about the liveness analysis pass: we could first run a memory lifting pass that lifts the memory allocations in TIR into Relax. The liveness analysis pass would then only need to analyze the Relax functions, not the TIR PrimFuncs in the IRModule. Would love to hear your thoughts. 😄

One suggestion on test case construction: when the TVMScript gets very long, we encourage developers to build the IRModule with the `BlockBuilder` and `emit_te` API instead, for example: https://github.com/tlc-pack/relax/blob/relax/tests/python/relax/test_transform_fuse_ops.py#L51-L57. This makes the test case more concise.

areusch · 2022-09-21T20:12:17Z

thanks @YuchenJin !

> One idea about the liveness analysis pass: we can have a memory lifting pass which lifts the memory allocations in TIR into Relax first, and this will allow the liveness analysis pass to analyze only the Relax functions without the need to analyze the TIR Primfuncs in the IRModule. Would love to hear your thoughts. 😄

one challenge we have with lifting allocs is that if a TIR PrimFunc has two internal allocs which don't overlap, then we wouldn't be able to detect that solely by looking at `Call(relax.builtin.alloc_tensor)`. However, I think that we might want to iterate on this PR to derive liveness based on first/last usage rather than just alloc nodes, so maybe this is less of a concern.
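To make the first/last-usage idea concrete, here is a minimal sketch in plain Python. It does not use any TVM API: the `schedule` list, the buffer names, and both helper functions are hypothetical stand-ins for the statements of a PrimFunc and the buffers each one touches. It only illustrates how usage-based intervals reveal that two internal allocs never overlap.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def live_intervals(schedule: List[List[str]]) -> Dict[str, Interval]:
    """Map each buffer name to (first_use_step, last_use_step)."""
    intervals: Dict[str, Interval] = {}
    for step, buffers in enumerate(schedule):
        for name in buffers:
            # Keep the first step we saw; extend the last step to now.
            first, _ = intervals.get(name, (step, step))
            intervals[name] = (first, step)
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """Two closed intervals overlap if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical PrimFunc body: each entry lists the buffers one statement touches.
schedule = [
    ["in", "tmp_a"],   # step 0
    ["tmp_a", "mid"],  # step 1
    ["mid", "tmp_b"],  # step 2
    ["tmp_b", "out"],  # step 3
]

intervals = live_intervals(schedule)
can_share = not overlaps(intervals["tmp_a"], intervals["tmp_b"])
```

In this toy schedule `tmp_a` is live over steps 0 to 1 and `tmp_b` over steps 2 to 3, so `can_share` is true. Looking only at where the two buffers are allocated would not expose that.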

YuchenJin · 2022-09-22T01:37:52Z

> one challenge we have with lifting allocs is that if a TIR PrimFunc has two internal allocs which don't overlap, then we wouldn't be able to detect that solely by looking at `Call(relax.builtin.alloc_tensor)`.

Thanks @areusch! If we run the MetaSchedule tuning pass or other transformations/schedules first (which is usually the case, since memory planning happens at a later stage of compilation), the temporary allocs inside a TIR PrimFunc will get removed, so usually there will not be multiple temporary allocs in a TIR PrimFunc. Would love to know the cases where there are several temporary allocs.

areusch · 2022-09-22T16:13:53Z

hm, i was thinking that you would see this case when doing multi-anchor fusion. I haven't explored that enough yet to know, though. it does seem like there isn't anything in TIR preventing this case from happening though, and if folks are writing custom TIR passes, it might not be sufficient to rely on MetaSchedule to reuse Buffers in TIR. with that said, this might not be as high of a priority if MetaSchedule does do this.

I'm not sure resolving this question changes the approach of modifying the LivenessAnalysis to generate alloc/kill events based on usage. However, it's certainly a good thing to understand further.

YuchenJin · 2022-09-23T18:37:59Z

> hm, i was thinking that you would see this case when doing multi-anchor fusion. I haven't explored that enough yet to know, though. it does seem like there isn't anything in TIR preventing this case from happening though, and if folks are writing custom TIR passes, it might not be sufficient to rely on MetaSchedule to reuse Buffers in TIR. with that said, this might not be as high of a priority if MetaSchedule does do this.
>
> I'm not sure resolving this question changes the approach of modifying the LivenessAnalysis to generate alloc/kill events based on usage. However, it's certainly a good thing to understand further.

Yes, I agree it does not change the general approach. My thought is if there are usually not multiple temporary allocs in a TIR PrimFunc, the liveness analysis pass would just need to traverse the Relax function after memory lifting, which would simplify the assumption and reduce the complexity of the liveness analysis pass by a lot. :)

* Add initial IRBuilder. * Add function output to irbuilder; update based on new AST. * Add call method; clean up bindings * Add test. * Add multifuction test * Move implementation to C++; infer shape and type * update op python hook * More tests and bug fix * Add comments. * Update shape/type inference. * Restructure code; add python type hint. * Cleanup code. * Rebase; address comments. * Add call intrinsic. * nits. * Remove call op. * Migrate scope to C++ using tvm::With. * Address naming. * Add GetBlocks API. * Unify EmitOutput APIs; add more comments. * Remove shape and type deduction code. * Also remove the shape/type attr interface. * Address comments. * Differentiate global and local function. * Reset counter after building func/block. * Rebase. * Remove shape infer builtin. * Return from void function as empty tuple. Co-authored-by: Michalis Papadimitriou <[email protected]>

* Copy jared's frontend * Remove some extraneous code + add TODOs * Skeleton AST * Added more skeleton AST, worked on parsing shape annotations. Something is wrong with span_to_span * Fix spans * Type annotations parsing correctly * some match_shape support * More bug fixes! Some stuff parses. Importing into tests is messed up. We probably need to restructure this code as well. * refactor parser and fill out more stubs * some parser tests * yolo dataflow * checkpoint for rebase * hook up AST * add inline TIR parsing * some cleanup * support call_packed parsing to ExternFunc call * remove stub ops * improve docstrings * address nits * support coercing tuples to ShapeExpr when possible for call_dps Co-authored-by: electriclilies <[email protected]>

…c-pack#9) * Relax pretty printer initial prototype * call into TVMScriptPrinter for PrimFuncs * most round-trip tests pass * address comments * implement relax.output syntax for dataflow block outputs * remove leftover comments * fix Var constructor on ShapeExpr annotation * fix DataflowVar as well

* Relax pretty printer initial prototype * call into TVMScriptPrinter for PrimFuncs * most round-trip tests pass * address comments * implement relax.output syntax for dataflow block outputs * remove leftover comments * fix Var constructor on ShapeExpr annotation * add printing and parsing for simple PrimExpr and Call Attrs

* ExprVisitor/ExprMutator for relax nodes. * Update Visitor & Mutator. * Update Mutator. * DataflowMutator interface. * EwiseFMARewriter. * Update fma rewrite and add test. * Update test. * Fix dataflow block dispatching. * Construct new dataflow block with IRBuilder. * VisitBinding return void and mutate internal IRBuilder. * Simplify. * Update emit dataflow output. * Explicit memeory allocation rewrite. * LazyIRBuilder. * Update ExplicitMemMutator. * Overload IRBuilder::Emit to have 3 styles. * Update IRBuilder/IRMutator interfaces and passes. * Add MatchShape binding to IRBuilder. * Improve IRMutator interface; add Normalize and CanProveShapeEqual to IRBuilder * Update EmitMatchShape. Co-authored-by: ZihengJiang <[email protected]>

* call_dps lowering. * Improve shape lowering. * Support alloc_storage for dynamic shape. * implementt ToNonDF to transform program to non-dataflow format. * Fix the mutator issue. * Update build api, an issue occurred. * vm tests can pass. * Support shape tuple in executable seriablization. * Fix for test. * Minor fixes. * Address comments. * Add mutate binding var back. * Visit binding var and fix tests. Co-authored-by: YuchenJin <[email protected]>

It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions. This pass will take chains of definitions and replace references to later definitions to the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. (Note: This required updating `BlockBuilder` to also update its binding map for `MatchShape` nodes; that was arguably a bug.) Additionally, `MatchShape` bindings where the `LHS` and the `RHS` are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.
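The chain-collapsing idea can be sketched without any TVM machinery. In the toy model below, a binding is a `(var, value)` pair where `value` is either another variable name (a plain alias) or a tuple `(op, arg, ...)`. This representation and the function names are assumptions made for illustration only; they are not the pass's actual data structures.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, Tuple[str, ...]]
Binding = Tuple[str, Value]


def resolve(var: str, aliases: Dict[str, str]) -> str:
    """Follow var-to-var definitions back to the original variable."""
    seen = set()
    while var in aliases and var not in seen:
        seen.add(var)
        var = aliases[var]
    return var


def canonicalize(bindings: List[Binding]) -> List[Binding]:
    """Rewrite every use of an aliased variable to the variable it aliases."""
    aliases: Dict[str, str] = {}
    result: List[Binding] = []
    for var, value in bindings:
        if isinstance(value, str):
            # `var = other_var`: record the alias and point it at the root.
            root = resolve(value, aliases)
            aliases[var] = root
            result.append((var, root))
        else:
            op, *args = value
            result.append((var, (op, *[resolve(a, aliases) for a in args])))
    return result


# y = x; z = y; w = add(z, y)
program = [("y", "x"), ("z", "y"), ("w", ("add", "z", "y"))]
canonical = canonicalize(program)
```

After the rewrite, `w` is defined as `add(x, x)`, and both `y` and `z` are bound directly to `x`, so a later dead-code pass could drop them if nothing else uses them.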

…-pack#247) This PR makes some small additions to the end-to-end AutoTIR script, namely eliminating a bug (it was incorrectly using the stateful API) and adding an option to save the test results as a CSV file for benchmarking purposes (the data can then be separately analyzed as needed). These changes also required a small extension to the save_function method in the VM, namely allowing it to take keyword arguments.

Attempting to use `dump_ast` on functions containing the operators `relax.unique` and `relax.print` previously crashed due to being unable to query their attributes' keys. It turned out that this was a problem with the operator attributes: They had not been registered on the Python side, so Python representation treated them as opaque TVM objects. This PR corrects this mistake.

…ack#257) It was observed that closures saved using `save_function` would crash when used over RPC with the `time_evaluator`, whereas using `set_input` and `invoke_stateful` worked as normal. While I am not entirely sure why these failures happened over RPC only in `time_evaluator` (but not in other RPC trials), it became clear that `set_input` performs a conversion of input tensor values in `SetInputTensorWithIndex`, while `save_function` was not doing this. Adding this conversion fixed the observed bug.

This PR adds a `ret_shape` field for specifying the shape of the function's return value. At present, we will not use this information, but by adding it into the AST, we will be able to parse the return shape and use it in the future. Parser V1 in this PR will just always list the `ret_shape` as `RuntimeDepShape`.

Previously, analyses to gather up all variables, free variables, bound variables, all global variables, and all global variables that are called had been implemented in C++ but had not been exposed in Python or tested. This PR exposes these analyses and adds tests for them. Two further changes: * The analyses previously ignored variables bound in `MatchShape` nodes; these are now treated as bindings too. * `rec_global_vars` is renamed `called_global_vars`, since the analysis itself does not check recursion.

It was brought up that Relay lacks an assert operator, so we may as well have one in Relax for debugging. One issue is that we can't name it "`assert`" because Python will treat it as a syntax error to have it as a field name for the "`relax`" module, i.e., `relax.assert` is a syntax error. Thus the op is named "`assert_op`," which is not ideal but serves its purpose.
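The naming constraint can be checked with the standard library alone. The snippet below only asks the Python compiler whether an expression parses; it does not import TVM, and the `parses` helper is a hypothetical name used for this illustration.

```python
import keyword


def parses(expr: str) -> bool:
    """Return True if `expr` is syntactically valid as a Python expression."""
    try:
        compile(expr, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# `assert` is a reserved word, so it cannot appear as an attribute name.
assert_is_keyword = keyword.iskeyword("assert")
attr_named_assert_ok = parses("relax.assert(cond)")
attr_named_assert_op_ok = parses("relax.assert_op(cond)")
```

Because `compile` only parses and never resolves names, `relax` and `cond` do not need to exist for this check.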

* [TVMScript] B4: If branch support (tlc-pack#263)
* B8: Local Function Support (tlc-pack#258)
* [TVMScript] B3: Type annotation checks (tlc-pack#256)
* [TVMScript][Parser] B1: Dataflow block (tlc-pack#252)
* [TVMScript] B2: match shape support (tlc-pack#251)
* [TVMScript] B6/B7: Symbolic shape and var shadowing (tlc-pack#245)
* [TVMScript] B5: Support relax op (tlc-pack#244)
* [TVMScript] B0: Call_tir support (tlc-pack#243)
* enhance parser error reporting (tlc-pack#242)
* [TVMScript] A1: Relax Parser infra (tlc-pack#240)
* update ci image versions. (tlc-pack#241)
* [TVMScript] B2-4: TIR IRBuilder (tlc-pack#239)
* [TVMScript] A0: Relax IRBuilder infra (tlc-pack#235)
* [TVMScript] B5-6: TIR IRBuilder (tlc-pack#231)
* [TVMScript] B1: IRBuilder (tlc-pack#228)
* [TVMScript] New Parser: Part C (tlc-pack#218)
* [TVMScript] New Parser: Part A (tlc-pack#221)
* [TVMScript] New Parser: Part B (tlc-pack#217)

Not recovered:

* [Pass] Separate ApplyHistoryBest from tuning passes (tlc-pack#226)
* [Bugfix] Couple of bug fixes to run TVM-gen code together with BYOC (tlc-pack#249)

co-authored-by: Yuchen Jin <[email protected]> co-authored-by: Siyuan Feng <[email protected]> co-authored-by: Ruihang Lai <[email protected]>

mbaret

I think this is largely code from the existing USMP, and the new additions seem in keeping with that code. Minor comments only; further debates on things like testing methodology are probably better had when we take this to main.

mbaret · 2022-10-04T22:51:22Z

python/tvm/relax/analysis/analysis.py

+
+    Returns
+    -------
+    Map<relax::Expr, BufferInfo>


`Dict[relax.Expr, BufferInfo]` would be more Python-style.

mbaret · 2022-10-04T22:52:49Z

src/relax/usmp/analysis/extract_buffer_info.cc

+/*!
+ * \file relax/usmp/analysis/extract_buffer_info.cc
+ *
+ * \brief This analysis pass consumes a TIR IRModule with a main function


Update comment to reflect the Relax/TIR module

mbaret · 2022-10-17T14:43:04Z

src/relax/usmp/analysis/extract_buffer_info.cc

+  ExprVisitor::VisitBinding_(binding);
+}
+
+static Integer CalculateRelaxExtentsSize(const DataType& dtype, const Array<PrimExpr>& extents) {


Is there a way we can consolidate this with the TIR version?
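One way to picture a consolidated helper is a function that depends only on the element type and the extents, so that a Relax caller and a TIR caller could both use it. The Python sketch below is an assumption about what such a helper could compute, not the code in this PR: the name `extents_size_bytes`, its parameters, and the choice to return `None` for non-static extents are all hypothetical.

```python
from typing import Optional, Sequence, Union


def extents_size_bytes(
    dtype_bits: int,
    lanes: int,
    extents: Sequence[Union[int, str]],
) -> Optional[int]:
    """Size in bytes of a buffer, or None if any extent is not a static int.

    Symbolic extents are modelled here as strings (e.g. "n").
    """
    # Round the element width up to whole bytes.
    element_bytes = (dtype_bits * lanes + 7) // 8
    num_elements = 1
    for ext in extents:
        if not isinstance(ext, int):
            # The size cannot be computed statically for a symbolic extent.
            return None
        num_elements *= ext
    return num_elements * element_bytes


static_size = extents_size_bytes(32, 1, [2, 3])      # float32[2, 3]
dynamic_size = extents_size_bytes(32, 1, ["n", 4])   # float32[n, 4]
scalar_size = extents_size_bytes(8, 1, [])           # int8 scalar
```

With a helper of this shape, the Relax-side and TIR-side code would differ only in how they extract the dtype and extents from their respective nodes.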

mbaret · 2022-10-17T15:00:58Z

> My thought is if there are usually not multiple temporary allocs in a TIR PrimFunc, the liveness analysis pass would just need to traverse the Relax function after memory lifting, which would simplify the assumption and reduce the complexity of the liveness analysis pass by a lot. :)

Ethos-U is a motivator for this functionality: it doesn't use MetaSchedule but does have multiple allocates in a single PrimFunc. Doing buffer consolidation on a per-PrimFunc basis will also generally be less efficient than doing it with global knowledge, where the memory fragmentation pattern is known.
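A toy comparison can illustrate the point about global knowledge. The sketch below uses a simple greedy first-fit planner written for this example; it is not one of USMP's algorithms. The buffer names, sizes, and live intervals are made up, and the "per-PrimFunc" case assumes each PrimFunc gets its own separate pool, which is only one possible reading of per-PrimFunc consolidation.

```python
from typing import Dict, Tuple

# name -> (size_bytes, first_use_step, last_use_step)
Buffers = Dict[str, Tuple[int, int, int]]


def plan(buffers: Buffers) -> Tuple[Dict[str, int], int]:
    """Greedy first-fit: place larger buffers first, at the lowest free offset."""
    placed = []  # (offset, size, first, last)
    offsets: Dict[str, int] = {}
    for name, (size, first, last) in sorted(
        buffers.items(), key=lambda kv: -kv[1][0]
    ):
        # Only buffers that are live at the same time constrain the offset.
        conflicts = sorted(
            (o, s) for o, s, f, l in placed if first <= l and f <= last
        )
        offset = 0
        for o, s in conflicts:
            if offset + size <= o:
                break  # the gap before this conflicting buffer is big enough
            offset = max(offset, o + s)
        placed.append((offset, size, first, last))
        offsets[name] = offset
    pool_size = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, pool_size


# Two hypothetical PrimFuncs called one after the other.
func_f: Buffers = {"a": (64, 0, 1), "b": (32, 1, 2)}
func_g: Buffers = {"c": (64, 3, 4), "d": (32, 4, 5)}

# Planning each PrimFunc in isolation, one pool per PrimFunc.
per_func_total = plan(func_f)[1] + plan(func_g)[1]

# Planning all buffers together with global liveness information.
global_offsets, global_total = plan({**func_f, **func_g})
```

In this example the global plan lets `c` reuse the space of `a` and `d` reuse the space of `b`, so one 96-byte pool suffices, while the two isolated plans need 96 bytes each.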

This commit changes the behavior of the parser to allow type annotations, as suggested by the community.

The current behavior:
- Use the more refined type/shape between user annotated and deduced type/shape.

The updated behavior:
- Always use user annotations
- Only checks if the type/shape is valid.

gigiblender force-pushed the usmp-live-analysis branch from b2cf067 to 8e23f91 Compare October 3, 2022 16:16

tqchen and others added 24 commits October 14, 2022 12:49

disable GH

bf03d1b

Relax Virtual Machine

4863b8e

Co-Authored-By: Yuchen Jin <[email protected]>

Relax AST (tlc-pack#2)

6dffcf7

Co-authored-by: ZihengJiang <[email protected]>

Implementation of CallDPS (tlc-pack#3)

448507d

* Implementation of call_dps. * Implementation of PackedFuncExpr. * Test CallDPS for TIR function. * Rename. * Add header and comments. * Update. * Address comments.

Update AST and Shape() implementation (tlc-pack#5)

8b06254

* Update AST. * ShapeOf. * ShapeOf. * Address comment.

Shape and type deduction (tlc-pack#7)

7ccf467

* Shape and type deduction. * Fix header. * Add call attrs to the deduce signature. * Address comments. * Add DiagnosticContext to IRBuilder and inference signature. * Fix nits.

Relax pretty printer (tlc-pack#8)

ac572d3

* Relax pretty printer initial prototype * call into TVMScriptPrinter for PrimFuncs * most round-trip tests pass * address comments * fix typo

Update MatchShape AST Node (tlc-pack#11)

b7a781d

* Update MatchShape AST Node. * Update. * Update.

[Parser][Printer] update parser and printer for match_shape (tlc-pack#13)

f3c1d02

Reorganize source code. (tlc-pack#14)

f952881

[Parser][Printer] Add class -> IRModule parsing, and extern func supp…

4a6750a

…ort for call_dps (tlc-pack#15) * update parser and printer for match_shape * support parsing class to IRModule, and extern func in call_dps

[Parser][Printer] relax call_packed arity, return IRModule factory, p…

97192ce

…rint IRModule PrimFuncs (tlc-pack#17)

[PASS] Shape lowering (tlc-pack#16)

057416b

* [PASS] Shape lowering. * Update to IRModule based. * TIR function generation. * Improve. * Improve. * Improve test. * Improve. * Address comment.

[Parser][Printer] explicitly parse and print attrs_type_key in calls (t…

debc524

…lc-pack#19) * relax call_packed arity, return IRModule factory, print IRModule PrimFuncs * explicitly parse and print attrs_type_key on calls * print type even when attrs has no fields

VM compiler. (tlc-pack#18)

175b9da

* VM compiler. * Update. * Compile IRmodule; expose Python api * Add dtype contant serialization and type hint. * Address comments. * Add todos and fix lint. * Update * Update.

Add type hint. (tlc-pack#20)

4a1d3da

Redesign IRBuilder to BlockBuilder (tlc-pack#22)

9efbd96

* init * update * update * test case working * update and add multi block test case * check in * fixes * fix * update * add * update * add * update * address comments. Co-authored-by: Altan Haan <[email protected]>

End2End Lowering Stage2: Enable Lowering from ShapeExpr to VM Executa…

f5ff3bb

…ble (tlc-pack#21) * rebase. * Update. * Update shape lowering, make sure the lowering pipeline works. * Address comment.

slyubomirsky and others added 12 commits October 16, 2022 08:29

[BugFix] Enable emit global MatchShape (tlc-pack#246)

dade30d

Fix an incorrect check which disables emitting global MatchShape outside a dataflow block and mistakenly enables emitting dataflow MatchShape outside a dataflow block.

[Call TIR] Fix bug when invoking call_tir with scalar values. (tlc-pa…

39448c3

…ck#254) This small PR changes a check in the tvmscript parser to support empty shape tuples which are used to represent scalars. I added a scalar addition test to make sure it works properly.

[Pass] Support Function and If in Normalize pass. (tlc-pack#268)

b46ced7

* Support Function and If in Normalize pass. * Use structural equality for expr_memo_. * Change back to pointer equality for expr_memo_; Add more tests. * rebase.

Enable Hexagon tests

935a3dc

mbaret reviewed Oct 17, 2022

sunggg and others added 3 commits October 17, 2022 17:16

Recover: [Pass] Separate ApplyHistoryBest from tuning passes (tlc-pac…

83de3b2

…k#226)

Recover: [Bugfix] Couple of bug fixes to run TVM-gen code together wi…

2963787

…th BYOC (tlc-pack#249)

Reenable autotvm silencer; fix e2e_auto_tir.py; fix lint.

c6d6a06

Co-authored-by: YuchenJin <[email protected]>

YuchenJin force-pushed the relax branch from d0659c0 to c6d6a06 Compare October 18, 2022 19:10

gigiblender force-pushed the usmp-live-analysis branch 2 times, most recently from f466d20 to 7600f9c Compare October 19, 2022 10:01

gigiblender closed this Nov 3, 2022

gigiblender force-pushed the usmp-live-analysis branch from 7600f9c to 32a03f8 Compare November 3, 2022 12:38

gigiblender added 2 commits November 3, 2022 14:40

[USMP] Initial implementation of liveness analysis for Relax + TIR

4e3a3c2

[USMP] Implement AssignPoolInfo pass

9885a8e

gigiblender reopened this Nov 3, 2022

YuchenJin force-pushed the relax branch from 1c73696 to 304048c Compare November 18, 2022 20:04

YuchenJin force-pushed the relax branch from b380aed to 40855eb Compare January 14, 2023 19:07

junrushao force-pushed the relax branch 2 times, most recently from 1f84c7b to 3cbe967 Compare February 9, 2023 02:09
